Keyphrase extraction through query performance prediction

نویسندگان

  • Gonenc Ercan
  • Ilyas Cicekli
چکیده

Previous research shows that keyphrases are useful tools in document retrieval and navigation. While these point to a relation between keyphrases and document retrieval performance, no other work uses this relationship to identify keyphrases of a given document. This work aims to establish a link between the problems of Query Performance Prediction (QPP) and keyphrase extraction. To this end, features used in QPP are evaluated in Keyphrase Extraction using a Naïve Bayes classifier. Our experiments indicate that these features improve keyphrase extraction effectiveness in documents of different length. More importantly, commonly used features of frequency and first position in text perform poorly on shorter documents, whereas QPP features are more robust and achieve better results.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

روش جدید متن‌کاوی برای استخراج اطلاعات زمینه کاربر به‌منظور بهبود رتبه‌بندی نتایج موتور جستجو

Today, the importance of text processing and its usages is well known among researchers and students. The amount of textual, documental materials increase day by day. So we need useful ways to save them and retrieve information from these materials. For example, search engines such as Google, Yahoo, Bing and etc. need to read so many web documents and retrieve the most similar ones to the user ...

متن کامل

Query-Oriented Keyphrase Extraction

People often issue informational queries to search engines to find out more about some entities or events. While a Wikipedia-like summary would be an ideal answer to such queries, not all queries have a corresponding Wikipedia entry. In this work we propose to study query-oriented keyphrase extraction, which can be used to assist search results summarization. We propose a general method for key...

متن کامل

How Document Pre-processing affects Keyphrase Extraction Performance

The SemEval-2010 benchmark dataset has brought renewed attention to the task of automatic keyphrase extraction. This dataset is made up of scientific articles that were automatically converted from PDF format to plain text and thus require careful preprocessing so that irrevelant spans of text do not negatively affect keyphrase extraction performance. In previous work, a wide range of document ...

متن کامل

Approximate Matching for Evaluating Keyphrase Extraction

We propose a new evaluation strategy for keyphrase extraction based on approximate keyphrase matching. It corresponds well with human judgments and is better suited to assess the performance of keyphrase extraction approaches. Additionally, we propose a generalized framework for comprehensive analysis of keyphrase extraction that subsumes most existing approaches, which allows for fair testing ...

متن کامل

Evaluating anaphora and coreference resolution to improve automatic keyphrase extraction

In this paper we analyze the effectiveness of using linguistic knowledge from coreference and anaphora resolution for improving the performance for supervised keyphrase extraction. In order to verify the impact of these features, we define a baseline keyphrase extraction system and evaluate its performance on a standard dataset using different machine learning algorithms. Then, we consider new ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • J. Information Science

دوره 38  شماره 

صفحات  -

تاریخ انتشار 2012